Self Organizing Map algorithm and distortion measure

Abstract 

We study the statistical meaning of the minimization of the distortion measure and the relation
between the equilibrium points of the SOM algorithm and the minima of the distortion measure.
If we assume that the observations and the map lie in a compact Euclidean space, we prove
the strong consistency of the maps which almost minimize the empirical distortion. Moreover,
after calculating the derivatives of the theoretical distortion measure, we show that the points
minimizing this measure and the equilibria of the Kohonen map do not match in general. We
illustrate, with a simple example, how this occurs.



keywords Distortion measure, asymptotic convergence, consistency, Self Organizing Map, empirical
processes, Glivenko-Cantelli class, uniform law of large numbers, general neighborhood function.

1 Introduction

The distortion, or distortion measure, is certainly the most popular criterion for assessing the quality
of the classification of a Kohonen map (see Kohonen [8]). This measure yields an assessment of
model properties with respect to the data and compensates for the absence of a cost function in the SOM
algorithm. Moreover, the SOM algorithm has been proven to be an approximation of the gradient
of the distortion measure (see Graepel et al. [6]).

Although the Kohonen map is proven, in some cases, to converge to equilibrium points when the
number of observations tends to infinity, the learning dynamic cannot be described by a gradient
descent of the distortion measure for an infinite number of observations (see for example Erwin et
al. [2]). Moreover, Kohonen [9] has shown in some examples for the one-dimensional grid that
the model vector produced by the SOM algorithm does not exactly coincide with the optimum of
the distortion measure. This property seems paradoxical: on the one hand, the SOM seems to minimize
the distortion for a finite number of observations; on the other hand, this behavior no longer holds in the limit,
i.e. for an infinite number of observations.

In this paper we investigate the relationship between the SOM and the distortion measure.
First, we prove the strong consistency of the estimator minimizing the empirical distortion.
More precisely, we prove that the maps almost minimizing the empirical distortion measure
converge almost surely to the set of maps minimizing the theoretical distortion measure. Second,
we calculate the derivatives of the theoretical distortion and deduce from this calculation that
the points minimizing the theoretical distortion generally differ from the equilibrium points of the
SOM, whatever the dimension of the grid. Finally, we illustrate, with a simple example, why
an apparent contradiction between the discrete and the continuous case occurs.

2 Distortion measure

We assume in the sequel that the observations $\omega_i$ are independent and identically distributed
(i.i.d.) and are of dimension $d$. We assume that the observations lie in a compact space; therefore,
without loss of generality, they lie in the compact space $[0,1]^d$. We also assume that these observations
follow the probability law $P$ having a density with respect to the Lebesgue measure of $\mathbb{R}^d$;
this density is assumed to be bounded by a constant $B$. In the sequel we call centroid a vector of
$[0,1]^d$ representing a class of observations $\omega$. We adopt in the sequel the notation of Cottrell et al. [1].

Definition 2.1 For $e \in \mathbb{N}^*$, $e \le d$, we consider a set of units indexed by $I \subset \mathbb{Z}^e$ with the neighborhood
function $\Lambda$ from $I - I := \{i - j,\ i, j \in I\}$ to $[0,1]$ satisfying $\Lambda(k) = \Lambda(-k)$ and $\Lambda(0) = 1$.
Note that such a neighborhood function can be discrete or continuous.

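Definition 2.2 Let $\|\cdot\|$ denote the Euclidean norm and let

\[ D_I^{\delta} := \left\{ x := (x_i)_{i \in I} \in \left([0,1]^d\right)^I \text{ such that } \|x_i - x_j\| \ge \delta \text{ if } i \ne j \right\} \]

be the set of centroids $x_i$ separated by, at least, $\delta$.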

Definition 2.3 If $x := (x_i)_{i \in I}$ is the set of units, the Voronoi tessellation $(C_i(x))_{i \in I}$ is defined by

\[ C_i(x) := \left\{ \omega \in [0,1]^d \;\middle|\; \|x_i - \omega\| < \|x_k - \omega\| \text{ if } k \ne i \right\} \]

In case of equality we assign $\omega \in C_i(x)$ according to the lexicographical order. Conversely, the index
of the Voronoi tessellation for an observation $\omega$ will be defined by

\[ C_x^{-1}(\omega) = i \in I \text{ if and only if } \omega \in C_i(x) \]
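
The index $C_x^{-1}(\omega)$ can be computed by a nearest-centroid search. The following Python sketch is only an illustration: the function names, the dictionary representation of the map and the way ties are resolved (smallest index in lexicographical order) are assumptions, not part of the original text.

```python
from typing import Dict, Sequence, Tuple

Index = Tuple[int, ...]   # a unit index i in I, a subset of Z^e
Point = Sequence[float]   # a point of [0, 1]^d


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def voronoi_index(x: Dict[Index, Point], omega: Point) -> Index:
    """Return the index i such that omega belongs to C_i(x).

    Ties are broken by taking the smallest index in lexicographical order
    (an assumption about how the tie-breaking rule is applied).
    """
    # Tuples compare lexicographically in Python, so min() over
    # (distance, index) pairs breaks exact distance ties by index order.
    return min((squared_distance(x[i], omega), i) for i in x)[1]


# A 2 x 2 grid of units (e = 2) with centroids in [0, 1]^2 (d = 2).
centroids = {
    (0, 0): (0.25, 0.25),
    (0, 1): (0.25, 0.75),
    (1, 0): (0.75, 0.25),
    (1, 1): (0.75, 0.75),
}
print(voronoi_index(centroids, (0.9, 0.1)))  # closest centroid is unit (1, 0)
print(voronoi_index(centroids, (0.5, 0.5)))  # four-way tie, resolved to (0, 0)
```

The point (0.5, 0.5) is equidistant from the four centroids, so only the tie-breaking rule decides its cell.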
Definition 2.4 The distortion measures the quality of a quantification with respect to the neighborhood
structure. It is defined as follows:

• Distortion for the discrete case (empirical distortion): We assume that the observations are
in a finite set $\{\omega_1, \dots, \omega_n\}$ and are uniformly distributed on this set. Then, the distortion
measure is

\[ V_n(x) = \frac{1}{2n} \sum_{i \in I} \sum_{\omega \in C_i(x)} \left( \sum_{j \in I} \Lambda(i - j) \|x_j - \omega\|^2 \right) \]

• Distortion for the continuous case (theoretical distortion): Let us assume that $P$ is the distribution
of the observations. The theoretical distortion measure is

\[ V(x) = \frac{1}{2} \sum_{i \in I} \int_{C_i(x)} \left( \sum_{j \in I} \Lambda(i - j) \|x_j - \omega\|^2 \right) dP(\omega) \]

As mentioned before, the distribution $P$ has a density with respect to the Lebesgue measure
bounded by a constant $B > 0$.

The distortion measure is well known not to be continuous with respect to the centroids $(x_i)_{i \in I}$
in the discrete case. Indeed, if an observation lies exactly on a hyperplane separating two centroids,
shifting one of the centroids will produce a jump in the distortion. So the distortion is not continuous
and, in general, a map which realizes the minimum of the empirical distortion does not exist.
However, if we consider sequences of maps $x_n$ such that the distortion $V_n(x_n)$ is sufficiently
close to its infimum, then we will show that such sequences of maps $x_n$ converge almost surely
to the set of maps which reach the minimum of the theoretical distortion measure $V(x)$.

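
The jump of the empirical distortion can be observed on a minimal example. The sketch below (Python; the function names and the particular neighborhood, equal to 1 for $|k| \le 1$ and 0 otherwise, are illustrative assumptions) evaluates the empirical distortion of a three-unit string on $[0,1]$ for a single observation lying exactly on the boundary between two cells, and then after a tiny shift of one centroid.

```python
from typing import Callable, Sequence


def empirical_distortion(
    x: Sequence[float],
    data: Sequence[float],
    neighborhood: Callable[[int], float],
) -> float:
    """Empirical distortion V_n(x) of a one-dimensional Kohonen string.

    x[i] is the centroid of unit i, data holds the observations, and
    neighborhood(k) plays the role of Lambda(k) for an index difference k.
    """
    total = 0.0
    for omega in data:
        # Winning unit: nearest centroid, ties resolved to the smallest index.
        winner = min(range(len(x)), key=lambda i: ((x[i] - omega) ** 2, i))
        total += sum(
            neighborhood(winner - j) * (x[j] - omega) ** 2 for j in range(len(x))
        )
    return total / (2 * len(data))


def nearest_neighbors_only(k: int) -> float:
    """Hypothetical neighborhood: 1 for |k| <= 1, 0 otherwise."""
    return 1.0 if abs(k) <= 1 else 0.0


data = [0.5]                         # a single observation
on_boundary = [0.25, 0.75, 1.0]      # 0.5 is equidistant from units 0 and 1
shifted = [0.25, 0.75 - 1e-9, 1.0]   # unit 1 moved slightly towards 0.5

v_before = empirical_distortion(on_boundary, data, nearest_neighbors_only)
v_after = empirical_distortion(shifted, data, nearest_neighbors_only)
print(v_before, v_after)  # the second value is larger by about 0.125
```

The observation changes cell, so the third centroid, which is a neighbor of unit 1 but not of unit 0, suddenly enters the sum; this produces the jump.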

3 Consistency of the almost minimum of distortion

This demonstration is an extended version of Rynkiewicz [11]. It follows the same line as Pollard
[10], so we will first show a uniform law of large numbers and then deduce the strong consistency
property.

3.1 Uniform law of large numbers



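Let the family of functions be

\[ G := \left\{ g_x(\omega) := \sum_{j \in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2 \text{ for } x \in D_I^{\delta} \right\} \]

In order to show the uniform law of large numbers, we have to prove that

\[ \sup_{x \in D_I^{\delta}} \left| \int g_x(\omega)\, dP_n(\omega) - \int g_x(\omega)\, dP(\omega) \right| \xrightarrow[n \to \infty]{a.s.} 0 \qquad (2) \]

where $P_n$ denotes the empirical distribution of the observations $\omega_1, \dots, \omega_n$, since, for every probability measure $Q$ on $[0,1]^d$:

\[ \int g_x(\omega)\, dQ(\omega) = \int \sum_{j \in I} \Lambda\left(C_x^{-1}(\omega) - j\right) \|x_j - \omega\|^2\, dQ(\omega) = \sum_{i, j \in I} \Lambda(i - j) \int_{C_i(x)} \|x_j - \omega\|^2\, dQ(\omega) \qquad (3) \]

Now, a sufficient condition to verify equation (2) is the following (see Gaenssler and Stute [5]):
$\forall \varepsilon > 0$, $\forall x^0 \in D_I^{\delta}$, a neighborhood $S(x^0)$ of $x^0$ exists such that

\[ \int g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int \left( \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) \le \int \left( \sup_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \int g_{x^0}(\omega)\, dP(\omega) + \varepsilon \qquad (4) \]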

First we prove the following result, using a technique similar to the proof of Lemma 11 of Fort and
Pagès [3].






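Lemma 3.1 Let $x \in D_I^{\delta}$ and $\lambda$ be the Lebesgue measure on $[0,1]^d$. Let $E^c$ denote the complement
of a set $E$ in $[0,1]^d$ and $|I|$ the cardinality of the set $I$. For $0 < \alpha < \frac{\delta}{2}$, let

\[ U_i^{\alpha}(x) = \left\{ \omega \in [0,1]^d \;/\; \exists y \in D_I^{\delta},\ x_j = y_j \text{ if } j \ne i \text{ and } \|x_i - y_i\| < \alpha \text{ and } \omega \in C_i^c(y) \cap C_i(x) \right\} \]

be the set of $\omega$ changing Voronoi cell when the centroid $x_i$ moves a distance of at most $\alpha$.
Then

\[ \sup_{x \in D_I^{\delta}} \lambda\left(U_i^{\alpha}(x)\right) < (|I| - 1) \left( \frac{2\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \qquad (5) \]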

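Proof Let $x$ and $y \in D_I^{\delta}$ satisfy the assumptions of Lemma 3.1 and let $j \ne i \in I$. In order to
prove the inequality, we have to bound the measure of the $\omega$ belonging to the cells $C_i(x)$ and $C_j(y)$
simultaneously, since $(C_i(y))^c = \bigcup_{j \in I,\, j \ne i} C_j(y)$.

Let $\langle z, t \rangle$ denote the inner product between $z$ and $t$, and let $\vec{n}_x^{ij} := \frac{x_j - x_i}{\|x_j - x_i\|}$. The parameter vector
$x + \gamma_1 \vec{n}_x^{ij}$ will be the vector with all components equal to those of $x$ except the component $i$, which is equal to
$x_i + \gamma_1 \vec{n}_x^{ij}$.

Since $\|y_i - x_i\| < \alpha$, we have $\langle y_i - x_i, \vec{n}_x^{ij} \rangle = \gamma_1$ with $|\gamma_1| < \alpha < \frac{\delta}{2}$. As the Lebesgue measure
(of dimension $d - 1$) of all plane sections of $[0,1]^d$ is bounded by $(\sqrt{2})^{d-1}$, when the
centroid $x_i$ moves by $\gamma_1 \vec{n}_x^{ij}$, the Lebesgue measure of the set of $\omega$ changing Voronoi cell satisfies

\[ \lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij}\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} \qquad (6) \]

Moreover, we note that $x + \gamma_1 \vec{n}_x^{ij}$ belongs to $D_I^{\delta/2}$.

On the other hand, let $y_i - x_i - \gamma_1 \vec{n}_x^{ij} := \gamma_2 \vec{n}_x^{ij\perp}$, with $\|\vec{n}_x^{ij\perp}\| = 1$, be the component of the movement
from $x_i$ to $y_i$ orthogonal to $\vec{n}_x^{ij}$, i.e. such that $\langle \vec{n}_x^{ij\perp}, \vec{n}_x^{ij} \rangle = 0$.

As shown in figure 1, in dimension 2, for all $x' = x + \gamma_1 \vec{n}_x^{ij} \in D_I^{\delta/2}$, the Lebesgue measure
of the set of $\omega$ changing Voronoi cell for a movement of the centroid $x'_i$ by $\gamma_2 \vec{n}_x^{ij\perp}$ is bounded by $\frac{2\alpha}{\delta} (\sqrt{2})^{d-1}$.
Therefore, we have

\[ \lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij} + \gamma_2 \vec{n}_x^{ij\perp}\right) \cap C_i(x) \right) < \alpha \left(\sqrt{2}\right)^{d-1} + \frac{2\alpha}{\delta} \left(\sqrt{2}\right)^{d-1} \qquad (7) \]

As this inequality is independent of $x$, we finally get

\[ \sup_{x \in D_I^{\delta}} \lambda\left( C_j\left(x + \gamma_1 \vec{n}_x^{ij} + \gamma_2 \vec{n}_x^{ij\perp}\right) \cap C_i(x) \right) < \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1} \qquad (8) \]

and then

\[ \sup_{x \in D_I^{\delta}} \lambda\left(U_i^{\alpha}(x)\right) < (|I| - 1) \left( \alpha + \frac{2\alpha}{\delta} \right) \left(\sqrt{2}\right)^{d-1} \]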



Now consider $x^0 \in D_I^{\delta}$ and $S(x^0)$ a neighborhood of $x^0$ included in a sphere of radius $\alpha$. Let
$W(x^0)$ be the set of $\omega$ remaining in their Voronoi cells when $x^0$ goes to any $x \in S(x^0)$. For all
$\omega \in W(x^0)$ we have

\[ \inf_{x \in S(x^0)} g_x(\omega) \ge g_{x^0}(\omega) - \sum_{j \in I} \Lambda\left(C_{x^0}^{-1}(\omega) - j\right) \left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right) \qquad (9) \]

For all $\omega \in [0,1]^d$, for a small enough $\alpha$, we have $\left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right) < \frac{\varepsilon}{2|I|}$,
so

\[ \int \left( \|x_j^0 - \omega\|^2 - \inf_{x \in S(x^0)} \|x_j - \omega\|^2 \right) dP(\omega) < \frac{\varepsilon}{2|I|} \quad \text{and} \quad \int_{W(x^0)} \left( g_{x^0}(\omega) - \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \frac{\varepsilon}{2} \qquad (10) \]

Now, let $W(x^0)^c$ be the set of $\omega$ changing Voronoi cell when the centroids go from $x^0$ to
$x \in S(x^0)$. For all $\omega \in W(x^0)^c$ there exist two different indices $i$ and $j$ such that $\omega \in C_i(x^0)$ and
$\omega \in C_j(x)$. Let us define a sequence $x^k$, $k \in \{0, \dots, |I|\}$, by sequentially changing the components
of $x^0$ into the components of $x$, so that $x^{|I|} = x$ (the $x^k$ are the intermediate configurations
transforming $x^0$ into $x$). Then there exists a step $l \in \{0, \dots, |I| - 1\}$ such that $\omega \in C_i(x^l)$ and
$\omega \notin C_i(x^{l+1})$. Indeed, if it were not the case, one could find a sequence $x^k$, $k \in \{0, \dots, |I|\}$, with
$x^{|I|} = x$ such that $\omega \in C_i(x^{|I|}) = C_i(x)$, which would be a contradiction. So $W(x^0)^c$ is included in
the set of $\omega$ which change Voronoi cell when we sequentially replace the components of $x^0$ by the
components of $x$.

If $\alpha < \frac{\delta}{2}$, then when the components $x_i^0$ of $x^0$ are moved sequentially to the components $x_i$ of $x$, each
intermediate configuration stays in $D_I^{\delta/2}$. Since, for all $i \in I$, $\|x_i - \omega\|^2$ is bounded by 1 on $[0,1]^d$,
Lemma 3.1 ensures that

\[ \int_{W(x^0)^c} g_x(\omega)\, dP(\omega) < B\,|I|\,(|I| - 1) \left( \frac{2\alpha}{\delta} + \alpha \right) \left(\sqrt{2}\right)^{d-1} \qquad (11) \]

Finally, if we choose a small enough $\alpha$ such that $B\,|I|\,(|I| - 1) \left( \frac{2\alpha}{\delta} + \alpha \right) (\sqrt{2})^{d-1} < \frac{\varepsilon}{2}$, we get

\[ \int g_{x^0}(\omega)\, dP(\omega) - \varepsilon < \int \left( \inf_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) \qquad (12) \]

In exactly the same way, for a small enough $\alpha$, we get

\[ \int \left( \sup_{x \in S(x^0)} g_x(\omega) \right) dP(\omega) < \int g_{x^0}(\omega)\, dP(\omega) + \varepsilon \qquad (13) \]

Therefore, the sufficient condition for the uniform law of large numbers is true.
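
The uniform convergence (2) can be illustrated numerically in the simplest setting. The sketch below is only an illustration, not a proof. It assumes $d = 1$, the uniform law on $[0,1]$, three units and a neighborhood equal to 1 for $|k| \le 1$ and 0 otherwise, and it replaces the supremum over $D_I^{\delta}$ by a maximum over a small finite family of ordered maps. The function names, the family of maps and the random seed are assumptions made for the example; the quantities compared are half of the two integrals in (2), i.e. the empirical and theoretical distortions.

```python
import random
from typing import Sequence


def neighborhood(k: int) -> float:
    """Hypothetical neighborhood function: 1 for |k| <= 1, 0 otherwise."""
    return 1.0 if abs(k) <= 1 else 0.0


def empirical_distortion(x: Sequence[float], data: Sequence[float]) -> float:
    """V_n(x) for a one-dimensional string with centroids x."""
    total = 0.0
    units = range(len(x))
    for omega in data:
        winner = min(units, key=lambda i: ((x[i] - omega) ** 2, i))
        total += sum(neighborhood(winner - j) * (x[j] - omega) ** 2 for j in units)
    return total / (2 * len(data))


def theoretical_distortion(x: Sequence[float]) -> float:
    """V(x) for the uniform law on [0, 1], with x sorted in increasing order."""
    # Cell boundaries: 0, the midpoints between consecutive centroids, and 1.
    bounds = [0.0] + [(a + b) / 2 for a, b in zip(x, x[1:])] + [1.0]
    total = 0.0
    for k in range(len(x)):
        lo, hi = bounds[k], bounds[k + 1]
        for j in range(len(x)):
            # Integral of (x_j - w)^2 over [lo, hi] in closed form.
            total += neighborhood(k - j) * ((hi - x[j]) ** 3 - (lo - x[j]) ** 3) / 3
    return total / 2


random.seed(0)
# A small finite family of ordered maps standing in for D_I^delta.
maps = [
    (a / 10, b / 10, c / 10)
    for a in range(0, 9, 2)
    for b in range(a + 1, 10, 2)
    for c in range(b + 1, 11, 2)
]
for n in (100, 5000):
    data = [random.random() for _ in range(n)]
    gap = max(
        abs(empirical_distortion(x, data) - theoretical_distortion(x)) for x in maps
    )
    print(n, gap)  # the largest gap over the family is expected to shrink with n
```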



3.2 Consistency 

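We want to show the consistency of the procedure which consists in choosing maps $(x_n)_{n \in \mathbb{N}^*}$ which almost
minimize the empirical distortions $(V_n(x))_{n \in \mathbb{N}^*}$ in $D_I^{\delta}$.
Let

\[ \bar{\chi}_n^{\beta} := \left\{ x \in D_I^{\delta} \text{ such that } V_n(x) < \inf_{x \in D_I^{\delta}} V_n(x) + \frac{1}{\beta(n)} \right\} \qquad (14) \]

be the set of estimators that almost minimize the empirical distortion, with $\beta(n)$ being a strictly
positive function such that $\lim_{n \to +\infty} \beta(n) = \infty$. Let $\bar{\chi} = \arg\min_{x \in D_I^{\delta}} V(x)$ be the set of maps
minimizing the theoretical distortion, possibly reduced to one map. It is easy to verify that the
function $x \mapsto V(x)$ is continuous on $D_I^{\delta}$, so for every neighborhood $\mathcal{N}$ of $\bar{\chi}$, there exists $\eta(\mathcal{N}) > 0$ such
that

\[ \forall x \in D_I^{\delta} \setminus \mathcal{N}, \quad V(x) > \min_{x \in D_I^{\delta}} V(x) + \eta(\mathcal{N}) \qquad (15) \]

To show the strong consistency, it is enough to prove that for all neighborhoods $\mathcal{N}$ of $\bar{\chi}$ we have

\[ \lim_{n \to \infty} \bar{\chi}_n^{\beta} \subset \mathcal{N} \ \text{a.s.} \iff \lim_{n \to \infty} V\left(\bar{\chi}_n^{\beta}\right) - V(\bar{\chi}) \le \eta(\mathcal{N}) \ \text{a.s.} \qquad (16) \]

with $V(E) - V(F) := \sup\{V(x) - V(y) \text{ for } x \in E \text{ and } y \in F\}$. By definition, $V_n(\bar{\chi}_n^{\beta}) \le V_n(\bar{\chi}) + \frac{1}{\beta(n)}$ a.s.,
and the uniform law of large numbers yields $\lim_{n \to \infty} V_n(\bar{\chi}) - V(\bar{\chi}) = 0$ a.s.; we then get
$\lim_{n \to \infty} V_n(\bar{\chi}_n^{\beta}) \le V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2}$ a.s. Moreover, we have $\lim_{n \to \infty} V(\bar{\chi}_n^{\beta}) - V_n(\bar{\chi}_n^{\beta}) = 0$ a.s. and

\[ \lim_{n \to \infty} V\left(\bar{\chi}_n^{\beta}\right) - \frac{\eta(\mathcal{N})}{2} < \lim_{n \to \infty} V_n\left(\bar{\chi}_n^{\beta}\right) \le V(\bar{\chi}) + \frac{\eta(\mathcal{N})}{2} \qquad (17) \]

Finally, $\lim_{n \to \infty} V(\bar{\chi}_n^{\beta}) - V(\bar{\chi}) < \eta(\mathcal{N})$ a.s., which proves the strong consistency of the maps which
almost minimize the empirical distortion.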



4 Differences between the SOM algorithm and distortion measure 

Using the result of the previous section, we can investigate the differences between the minima
of the empirical distortion and the equilibria of the SOM algorithm. Namely, if these equilibria
were maps almost minimizing the empirical distortion criterion, they would converge, as the number
of observations increases, to the minimum of the theoretical distortion measure; we will show
that this is not generally the case. In the next section we compute the gradient of the function
$V(x)$ and show that, even in multidimensional cases, the equilibria of the SOM algorithm and the
minima of $V(x)$ do not match. These results generalize the results of Kohonen [9], obtained for
one-dimensional cases.

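4.1 Differentiability of V(x)

Let us now write

\[ D_I := \left\{ x := (x_i)_{i \in I} : x_i^k \in [0,1]\ \forall k \in \{1, \dots, d\},\ \|x_i - x_j\| > 0 \text{ if } i \ne j \right\} \qquad (18) \]

For $i$ and $j \in I$, let $\vec{n}_x^{ij}$ denote the vector

\[ \vec{n}_x^{ij} := \frac{x_j - x_i}{\|x_j - x_i\|} \]

and let

\[ M_x^{ij} := \left\{ u \in \mathbb{R}^d \;/\; \left\langle u - \frac{x_i + x_j}{2},\ x_i - x_j \right\rangle = 0 \right\} \qquad (19) \]

be the mediator hyperplane. Let $\lambda_x^{ij}(\omega)$ denote the Lebesgue measure on $M_x^{ij}$. Fort and Pagès [3]
have shown the following lemma:

Lemma 4.1 Let $\phi$ be an $\mathbb{R}$-valued continuous function on $[0,1]^d$. For $x \in D_I$, let $\Phi_i(x) :=
\int_{C_i(x)} \phi(\omega)\, d\omega$. Let also $(e_1, \dots, e_d)$ denote the canonical basis of $\mathbb{R}^d$. The function $\Phi_i$ is continuously
differentiable on $D_I$ and $\forall i \ne j$, $l \in \{1, \dots, d\}$,

\[ \frac{\partial \Phi_i}{\partial x_j^l}(x) = \int_{\bar{C}_i(x) \cap \bar{C}_j(x)} \phi(\omega) \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ij}, e_l \right\rangle + \frac{1}{\|x_j - x_i\|} \left\langle \frac{x_i + x_j}{2} - \omega,\ e_l \right\rangle \right\} \lambda_x^{ij}(d\omega) \qquad (20) \]

Moreover, if we write $\frac{\partial \Phi_i}{\partial x_j}(x) := \left( \frac{\partial \Phi_i}{\partial x_j^1}(x), \dots, \frac{\partial \Phi_i}{\partial x_j^d}(x) \right)^T$, then

\[ \frac{\partial \Phi_i}{\partial x_i}(x) = -\sum_{j \in I,\, j \ne i} \frac{\partial \Phi_j}{\partial x_i}(x) \qquad (21) \]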



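Then, we deduce the theorem:

Theorem 4.2 If $P(d\omega) = f(\omega)\, d\omega$, where $f$ is continuous on $[0,1]^d$, then $V$ is continuously
differentiable on $D_I$ and we have

\[
\begin{aligned}
\frac{\partial V}{\partial x_i}(x) ={}& \sum_{k \in I} \Lambda(i - k) \int_{C_k(x)} (x_i - \omega)\, P(d\omega) \\
&+ \frac{1}{2} \sum_{j \in I} \sum_{k \in I,\, k \ne i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_j - \omega\|^2 \left( \frac{1}{2} \vec{n}_x^{ki} + \frac{1}{\|x_k - x_i\|} \left( \frac{x_i + x_k}{2} - \omega \right) \right) f(\omega)\, \lambda_x^{ik}(d\omega)
\end{aligned}
\qquad (22)
\]

where $\frac{\partial V}{\partial x_i}(x) := \left( \frac{\partial V}{\partial x_i^1}(x), \dots, \frac{\partial V}{\partial x_i^d}(x) \right)^T$.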

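Proof As the function $V(x)$ is continuous on $D_I$, we only have to show that the partial derivatives
exist and are continuous. Let $h_i^l \in \mathbb{R}^{|I| \times d}$ denote the vector with all components null except the
component corresponding to $x_i^l$, which is $h > 0$. Then

\[
\begin{aligned}
\frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k, j \in I,\ k, j \ne i} \Lambda(k - j)\, \frac{\int_{C_k(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \int_{C_k(x)} \|x_j - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{1}{2} \sum_{j \in I,\ j \ne i} \Lambda(i - j)\, \frac{\int_{C_i(x + h_i^l)} \|x_j - \omega\|^2 P(d\omega) - \int_{C_i(x)} \|x_j - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{1}{2} \sum_{k \in I,\ k \ne i} \Lambda(k - i)\, \frac{\int_{C_k(x + h_i^l)} \|x_i + h e_l - \omega\|^2 P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega)}{h} \\
&+ \frac{1}{2}\, \frac{\int_{C_i(x + h_i^l)} \|x_i + h e_l - \omega\|^2 P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega)}{h}
\end{aligned}
\qquad (23)
\]

where the first two lines of the sum concern centroids different from $x_i$ and the last two lines the
variation involving $x_i$. Now, by applying Lemma 4.1 to the first two lines of the sum, we get:

\[
\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k, j \in I,\ k, j \ne i} \Lambda(k - j) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&- \frac{1}{2} \sum_{k, j \in I,\ k, j \ne i} \Lambda(i - j) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&+ \lim_{h \to 0} \frac{1}{2} \sum_{k \in I,\ k \ne i} \Lambda(k - i)\, \frac{\int_{C_k(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h (x_i^l - \omega^l) + o(h) \right) P(d\omega) - \int_{C_k(x)} \|x_i - \omega\|^2 P(d\omega)}{h} \\
&+ \lim_{h \to 0} \frac{1}{2}\, \frac{\int_{C_i(x + h_i^l)} \left( \|x_i - \omega\|^2 + 2h (x_i^l - \omega^l) + o(h) \right) P(d\omega) - \int_{C_i(x)} \|x_i - \omega\|^2 P(d\omega)}{h}
\end{aligned}
\qquad (24)
\]

Then, by applying Lemma 4.1 to the last two lines, we get:

\[
\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} ={}& \frac{1}{2} \sum_{k, j \in I,\ k, j \ne i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&+ \frac{1}{2} \sum_{k \in I,\ k \ne i} \Lambda(k - i) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&- \frac{1}{2} \sum_{k \in I,\ k \ne i} \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_i - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&+ \sum_{k \in I} \Lambda(k - i) \int_{C_k(x)} (x_i^l - \omega^l)\, P(d\omega)
\end{aligned}
\qquad (25)
\]

Finally, since $\Lambda(0) = 1$, the second and third lines are the terms $j = i$ of the first sum, and

\[
\begin{aligned}
\lim_{h \to 0} \frac{V(x + h_i^l) - V(x)}{h} = \frac{\partial V}{\partial x_i^l}(x) ={}& \frac{1}{2} \sum_{k, j \in I,\ k \ne i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_i(x) \cap \bar{C}_k(x)} \|x_j - \omega\|^2 \left\{ \frac{1}{2} \left\langle \vec{n}_x^{ki}, e_l \right\rangle + \frac{1}{\|x_k - x_i\|} \left\langle \frac{x_i + x_k}{2} - \omega,\ e_l \right\rangle \right\} f(\omega)\, \lambda_x^{ik}(d\omega) \\
&+ \sum_{k \in I} \Lambda(k - i) \int_{C_k(x)} (x_i^l - \omega^l)\, P(d\omega) \qquad \blacksquare
\end{aligned}
\qquad (26)
\]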



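If we assume that the minimum of the distortion measure is reached in the interior of $D_I$ (i.e. that
no centroids collapse), we deduce from the previous results that it does not match the equilibrium
of the Kohonen algorithm. Indeed, a point $x^* := (x_i^*)_{i \in I}$ asymptotically stable for the Kohonen
algorithm will verify, for all $i \in I$:

\[ \sum_{k \in I} \Lambda(i - k) \int_{C_k(x^*)} (x_i^* - \omega)\, P(d\omega) = 0 \qquad (27) \]

This equation is valid even for the batch algorithm (see Fort, Cottrell and Letremy [4]). It can
match a minimum of the limit distortion only if

\[ \frac{1}{2} \sum_{j \in I} \sum_{k \in I,\, k \ne i} \left( \Lambda(k - j) - \Lambda(i - j) \right) \int_{\bar{C}_i(x^*) \cap \bar{C}_k(x^*)} \|x_j^* - \omega\|^2 \left( \frac{1}{2} \vec{n}_{x^*}^{ki} + \frac{1}{\|x_k^* - x_i^*\|} \left( \frac{x_i^* + x_k^*}{2} - \omega \right) \right) f(\omega)\, \lambda_{x^*}^{ik}(d\omega) = 0 \qquad (28) \]

but, in general, this term is not null.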

4.2 Example of a Kohonen string with 3 centroids 

The previous section has shown that the minimum of the distortion measure does not match the
equilibrium of the Kohonen algorithm. We will illustrate this with a simple example. The classical
explanation (see Kohonen [7]) of local potential minimization by the Kohonen algorithm is far from
being satisfactory. Actually, it seems that the minima of the empirical distortion measure always occur at a
discontinuity point, where the function is not differentiable.

To illustrate this, let a Kohonen string be on the segment [0,1] (see figure 2), with a discrete
neighborhood

\[ \Lambda(i - j) = \begin{cases} 1 & \text{if } |i - j| \le 1 \\ 0 & \text{otherwise} \end{cases} \]

4.2.1 The theoretical difference

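The equilibrium of the SOM algorithm is reached at points $x$ verifying

\[
\begin{aligned}
\int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) &= 0 \\
\int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) &= 0 \\
\int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) &= 0
\end{aligned}
\qquad (29)
\]

but the minima of the distortion are reached at points $x$ verifying

\[
\begin{aligned}
\frac{\partial V}{\partial x_1}(x) ={}& \int_{C_1(x)} (x_1 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_1 - \omega)\, P(d\omega) - \frac{1}{4} \left\| x_3 - \frac{x_1 + x_2}{2} \right\|^2 f\left( \frac{x_1 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_2}(x) ={}& \int_{C_1(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_2(x)} (x_2 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_2 - \omega)\, P(d\omega) \\
& - \frac{1}{4} \left\| x_3 - \frac{x_1 + x_2}{2} \right\|^2 f\left( \frac{x_1 + x_2}{2} \right) + \frac{1}{4} \left\| x_1 - \frac{x_3 + x_2}{2} \right\|^2 f\left( \frac{x_3 + x_2}{2} \right) = 0 \\
\frac{\partial V}{\partial x_3}(x) ={}& \int_{C_2(x)} (x_3 - \omega)\, P(d\omega) + \int_{C_3(x)} (x_3 - \omega)\, P(d\omega) + \frac{1}{4} \left\| x_1 - \frac{x_3 + x_2}{2} \right\|^2 f\left( \frac{x_3 + x_2}{2} \right) = 0
\end{aligned}
\]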
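
As a cross-check of the boundary terms, the conditions for the minima can be recovered by a direct computation in this one-dimensional setting. The derivation below assumes the ordering $x_1 < x_2 < x_3$ and a neighborhood equal to 1 for $|i - j| \le 1$ and 0 otherwise; the shorthand $a$ and $b$ for the two cell boundaries is introduced here only for illustration.

```latex
% Cell boundaries (illustrative shorthand): C_1 = [0,a], C_2 = [a,b], C_3 = [b,1]
a := \frac{x_1 + x_2}{2}, \qquad b := \frac{x_2 + x_3}{2}

% x_1 contributes on C_1 and C_2, x_2 on all cells, x_3 on C_2 and C_3:
V(x) = \frac{1}{2} \left[ \int_0^b (x_1 - \omega)^2 f(\omega)\, d\omega
     + \int_0^1 (x_2 - \omega)^2 f(\omega)\, d\omega
     + \int_a^1 (x_3 - \omega)^2 f(\omega)\, d\omega \right]

% Leibniz rule, with da/dx_1 = da/dx_2 = db/dx_2 = db/dx_3 = 1/2:
\frac{\partial V}{\partial x_1} = \int_0^b (x_1 - \omega) f(\omega)\, d\omega
     - \frac{1}{4} (x_3 - a)^2 f(a)

\frac{\partial V}{\partial x_2} = \int_0^1 (x_2 - \omega) f(\omega)\, d\omega
     - \frac{1}{4} (x_3 - a)^2 f(a) + \frac{1}{4} (x_1 - b)^2 f(b)

\frac{\partial V}{\partial x_3} = \int_a^1 (x_3 - \omega) f(\omega)\, d\omega
     + \frac{1}{4} (x_1 - b)^2 f(b)
```

The integral terms are exactly the left-hand sides of (29); the remaining terms come only from the displacement of the cell boundaries.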

If we assume, for example, that the density of observations is uniform on $[0,1]$, i.e. $f(x) = 1$ if $x \in [0,1]$,
then these two sets of points have no point in common. Indeed, if a point belonged to both sets, the
boundary terms would have to vanish, that is

\[ x_3 - \frac{x_1 + x_2}{2} = 0 \qquad (30) \]

\[ x_1 - \frac{x_2 + x_3}{2} = 0 \qquad (31) \]

Subtracting (31) from (30) gives $x_3 - x_1 = \frac{x_1 - x_3}{2}$, hence $x_1 = x_3$. Therefore, $x_1 = x_2 = x_3$, but this point is clearly not an equilibrium of the Kohonen map.
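
For the uniform density this mismatch can also be checked numerically. The sketch below is an illustration under stated assumptions, not the method of the paper: it assumes an increasing string of three units on $[0,1]$ with a neighborhood equal to 1 for $|k| \le 1$ and 0 otherwise, finds a solution of (29) by iterating a batch update, and then evaluates the gradient of the theoretical distortion at that point by finite differences. All function names are hypothetical.

```python
from typing import List, Sequence


def neighborhood(k: int) -> float:
    """Neighborhood of the three-unit string: 1 for |k| <= 1, 0 otherwise."""
    return 1.0 if abs(k) <= 1 else 0.0


def cell_bounds(x: Sequence[float]) -> List[float]:
    """Boundaries of the Voronoi cells of an increasing string on [0, 1]."""
    return [0.0] + [(a + b) / 2 for a, b in zip(x, x[1:])] + [1.0]


def batch_step(x: Sequence[float]) -> List[float]:
    """One batch update for the uniform density: solve (27) with cells frozen."""
    b = cell_bounds(x)
    units = range(len(x))
    mass = [b[k + 1] - b[k] for k in units]                      # P(C_k)
    moment = [(b[k + 1] ** 2 - b[k] ** 2) / 2 for k in units]    # integral of w on C_k
    return [
        sum(neighborhood(i - k) * moment[k] for k in units)
        / sum(neighborhood(i - k) * mass[k] for k in units)
        for i in units
    ]


def distortion(x: Sequence[float]) -> float:
    """Theoretical distortion V(x) for the uniform density, in closed form."""
    b = cell_bounds(x)
    units = range(len(x))
    return 0.5 * sum(
        neighborhood(k - j) * ((b[k + 1] - x[j]) ** 3 - (b[k] - x[j]) ** 3) / 3
        for k in units
        for j in units
    )


def gradient(x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central finite-difference approximation of the gradient of V."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((distortion(up) - distortion(down)) / (2 * h))
    return grad


x = [0.2, 0.5, 0.8]
for _ in range(60):   # iterate the batch map until it stabilizes
    x = batch_step(x)
print(x)              # close to [0.3, 0.5, 0.7]
print(gradient(x))    # close to [-0.0225, 0.0, 0.0225]: not a critical point of V
```

At the equilibrium $(0.3, 0.5, 0.7)$ the integral terms vanish, so the gradient reduces to the boundary terms $\mp \frac{1}{4}(0.3)^2 = \mp 0.0225$, which are not zero.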

4.2.2 Illustration of the behavior of the distortion measure

We will see that if one draws data with a uniform distribution on the segment [0, 1] and then
computes the minimum of the empirical distortion, this minimum is always at a discontinuity point.
The more observations one has, the more discontinuities there are, but the global function looks
more and more regular. This is not surprising, since we know that the limit is differentiable.

The method of simulation Since we have no numerical algorithm to compute the exact
minimum of the distortion (the extended variance), we proceed by exhaustive search based on a discretization of the space of the
centroids. To avoid too much computation, 0.001 is chosen as the discretization step. The following
figures are obtained in the following way:

1. Simulate n "data" $(\omega_1, \dots, \omega_n)$, drawn from a uniform law on [0, 1].

2. Search exhaustively, on the discretization of $D_I^{\delta}$, the string which minimizes the distortion.

3. For the best string $(x_1^*, x_2^*, x_3^*)$, the graphical representations are obtained in the following
way:

• 3D Representation: we keep one centroid of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the
other two in a small neighborhood of their optimal positions. The level z is the extended
variance multiplied by the number of observations n.

• 2D Representation: we keep two centroids of the triplet $(x_1^*, x_2^*, x_3^*)$ fixed, then we move the
last one in a small neighborhood of its optimal position. The level z is the extended
variance multiplied by the number of observations n.
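
The three steps above can be sketched as follows. This is a hedged Python illustration, not the code used for the figures: it assumes a neighborhood equal to 1 for $|k| \le 1$ and 0 otherwise, uses a much coarser grid (step 1/20 instead of 0.001) so that it runs quickly, and the function names, the random seed and the sampling of the 2D profile are assumptions.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def distortion(x: Sequence[float], data: Sequence[float]) -> float:
    """Empirical distortion of a three-unit string, neighborhood 1 for |k| <= 1."""
    total = 0.0
    for omega in data:
        winner = min(range(len(x)), key=lambda i: ((x[i] - omega) ** 2, i))
        total += sum(
            (x[j] - omega) ** 2 for j in range(len(x)) if abs(winner - j) <= 1
        )
    return total / (2 * len(data))


def exhaustive_search(
    data: Sequence[float], steps: int
) -> Tuple[float, Tuple[float, ...]]:
    """Best string on the grid {0, 1/steps, ..., 1}, centroids pairwise distinct."""
    grid = [k / steps for k in range(steps + 1)]
    # Distinct grid points are at least 1/steps apart, which plays the role of delta.
    return min((distortion(x, data), x) for x in itertools.permutations(grid, 3))


random.seed(1)
data = [random.random() for _ in range(10)]                  # step 1: n = 10
best_value, best_string = exhaustive_search(data, steps=20)  # step 2: coarse grid
print(best_string, best_value)

# Step 3, 2D representation: move the last centroid around its optimum and
# record the level z = n * distortion for each position.
profile: List[Tuple[float, float]] = []
for t in range(-50, 51):
    moved = best_string[2] + t / 1000
    z = len(data) * distortion((best_string[0], best_string[1], moved), data)
    profile.append((moved, z))
```

Plotting `profile` would give a curve of the kind described for the 2D representation; the coarse grid means the reported string is only the best one on that grid.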
The following figures show the results obtained for a number of observations n equal to 10,
100 and 1000. We notice that, even for a small number of observations, the minima are always at
discontinuity points.

Figure 3: Distortion measure for 10 observations

5 Conclusion

For a finite number of observations, the Kohonen algorithm was supposed to give an approximation
of the minimum of the distortion measure, but if this were the case, why can the equilibrium points
of the algorithm differ from the theoretical minimum of the distortion? We have shown
that if we choose maps that almost minimize the empirical distortion, then these maps have to
converge to the set of maps which minimize the theoretical distortion. But, by calculating the
derivative of the theoretical distortion, we have shown that the equilibria of the Kohonen map cannot
minimize this distortion in general. We have illustrated this fact with an example where the minimum
is always reached at discontinuity points. This fact shows that the local differentiability of the distortion
measure is not an important property and is not a satisfactory explanation for the behavior of the
Kohonen algorithm when the number of observations is finite.